The utility of different representations of protein sequence for predicting functional class

Authors

  • Ross D. King
  • Andreas Karwath
  • Amanda Clare
  • Luc Dehaspe
Abstract

MOTIVATION: Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome as a model.

RESULTS: With every type of representation, DMP learnt prediction rules that were more accurate than the default at every level of function. The most effective way to represent sequence was phylogeny (75% accuracy and 13% coverage of unassigned ORFs at the most general level of function; 69% accuracy and 7% coverage at the most detailed). We tested different methods for combining predictions from the different types of representation. These improved both the accuracy and the coverage of predictions: for example, 40% of all unassigned ORFs could be predicted at an estimated accuracy of 60%, and 5% of unassigned ORFs could be predicted at an estimated accuracy of 86%.
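
To make two terms from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a "residue frequencies" representation means the relative frequency of each of the 20 standard amino acids, that coverage means the fraction of ORFs for which some rule makes a prediction, and that accuracy means the fraction of those predictions that are correct. The function names and the dictionary-based data layout are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def residue_frequencies(sequence):
    """Relative frequency of each standard residue (assumed representation)."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)  # ignore non-standard symbols
    if total == 0:
        return {a: 0.0 for a in AMINO_ACIDS}
    return {a: counts[a] / total for a in AMINO_ACIDS}


def accuracy_and_coverage(predictions, true_classes):
    """Score rule-based predictions that are allowed to abstain.

    predictions:  dict mapping ORF id -> predicted class, or None if no rule fired
    true_classes: dict mapping ORF id -> known functional class
    """
    made = {orf: p for orf, p in predictions.items() if p is not None}
    coverage = len(made) / len(predictions) if predictions else 0.0
    correct = sum(1 for orf, p in made.items() if true_classes.get(orf) == p)
    accuracy = correct / len(made) if made else 0.0
    return accuracy, coverage
```

Under these assumed definitions, a rule set that abstains more often can gain accuracy at the cost of coverage, which is the trade-off the reported figures describe (for example, 60% accuracy at 40% coverage versus 86% accuracy at 5% coverage).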


Related resources

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted into protein sequences by transcription and translation. The protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can imagine is undertaken by proteins with specific functio...


iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

The PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or false negative hits. If false positives are considered items that reduce the selectivity of a pattern, then the more selective the pattern we have, the more accur...


On the $c_{0}$-solvability of a class of infinite systems of functional-integral equations

In this paper, an existence result for a class of infinite systems of functional-integral equations in the Banach sequence space $c_{0}$ is established via the well-known Schauder fixed-point theorem together with a criterion of compactness in the space $c_{0}$. Furthermore, we include some remarks to show the breadth of the class of infinite systems that can be covered by our result. The a...


Functional Assessment of CODM Gene in Different Cultivar of Papaveraceous Species Via In Silico Analysis

Medicinal use of the opium poppy (Papaver somniferum L.) has an ancient history, but the isolation of morphine was not described until the early nineteenth century. Morphine has been the most important alkaloid of the opium poppy in the last 50 years. In the pathway reported to generate morphine in this species, CODM has a crucial role as the gene coding the enzyme respons...



Journal:
  • Bioinformatics

Volume 17, Issue 5

Pages: -

Publication date: 2001